Abstract

We will outline novel approaches to derive model invariants for 
hidden Markov and related models. These approaches are based on 
a theoretical framework that arises from viewing random processes as 
elements of the vector space of string functions. Theorems available 
from that framework then give rise to novel ideas to obtain model 
invariants for hidden Markov and related models. 



1 Introduction 

In the following, we will outline how to obtain invariants for hidden Markov
and related models, based on an approach which, in its most prominent
application, served to solve the identifiability problem for hidden Markov
processes (HMPs) in 1992 [15]. Some of its foundations were laid in the
late 50's and early 60's in order to get a grasp of problems related to that of
identifying HMPs [5, 3]. The approach can be viewed as being centered around
the definition of finite-dimensional discrete-time, discrete-valued stochastic
processes (referred to as discrete random processes in the following).¹
Examples of finite-dimensional discrete random processes other than HMPs are
quantum random walks (QRWs). QRWs have been brought up mostly to emulate
Markov chain related algorithms (e.g. Markov Chain Monte Carlo techniques)
on quantum computers.

In the following, we will introduce finite-dimensional string functions and
formally describe how to view discrete random processes as string functions.
We will further provide helpful characterizations and related theorems. In
sec. 3 we will determine polynomials that generate the ideal of the invariants
of the finite-dimensional model, in the usual sense of algebraic statistics. In
sec. 4 we will prove a theorem from which, as a corollary, one obtains a
proof of conjecture 11.9 in [3]. This corollary will be listed in sec. 5 where
we will draw the connections to the hidden Markov model in more detail. In
sec. 6 we will show how to obtain invariants for the Markov models, based
on the results of the preceding sections. In sec. 7 we will briefly demonstrate
that trace algebras, as well, can be viewed as certain finite-dimensional string
functions. Invariants of the finite-dimensional model are relatively easy to
obtain

1 In the literature, finite-dimensional discrete random processes are
alternatively referred to as finitary [12] or linearly dependent [13] processes.
In the following, we will stay with the term finite-dimensional (discrete random
processes) in accordance with the latest contributions on the topic [20].



2 Preliminaries: String Functions 

Detailed proofs and explanations of the following results can be found in
[18]. Let $\Sigma^* = \cup_{n \ge 0} \Sigma^n$ denote the set of all strings of
finite length over the finite alphabet $\Sigma$ where the word
$\square \in \Sigma^0$ of length $|\square| = 0$ is the empty string. Single
letters are usually denoted by $a, b$ whereas strings of arbitrary length are
denoted by $v, w$ (for example, $v = a_1...a_n \in \Sigma^n$,
$w = b_1...b_m \in \Sigma^m$ where $a_i, b_j \in \Sigma$). We have the
concatenation operation:

$$v \in \Sigma^n,\ w \in \Sigma^m \;\Longrightarrow\; vw \in \Sigma^{m+n}. \tag{1}$$

We denote the length of $v \in \Sigma^n$ by $|v| = n$. We now direct our
attention to real-valued string functions

$$p : \Sigma^* \to \mathbb{R} \tag{2}$$

and further to $\mathbb{R}^{\Sigma^*}$, that is, to the real vector space of
string functions over $\Sigma$. The notation $p$ is motivated by the fact that
discrete random processes will be viewed as string functions, which will be
described in the following.

2.1 Discrete Random Processes as String Functions 

Given a discrete random process $(X_t)$ with values in the alphabet $\Sigma$,
the prescription

$$p_X(v = a_1...a_n) = P(\{X_1 = a_1, ..., X_n = a_n\})$$

gives rise to a string function $p_X$ associated with the random process.
$p_X(a_1...a_n)$ then just is the probability that the associated random process
emits the string $a_1...a_n$ at periods $t = 1, ..., n$. String functions
associated with discrete random processes can be characterized as follows.

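Theorem 2.1. A string function $p : \Sigma^* \to \mathbb{R}$ is associated with
a discrete random process iff the following conditions hold.

(a) $p(v) \ge 0$ for all $v \in \Sigma^*$.
(b) $\sum_{a \in \Sigma} p(va) = p(v)$ for all $v \in \Sigma^*$.
(c) $p(\square) = 1$.

Note that (b) in combination with (c) implies

$$\sum_{v \in \Sigma^n} p(v) = 1 \quad \text{for all } n \ge 0. \tag{3}$$
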

Definition 2.2. A string function $p : \Sigma^* \to \mathbb{R}$ is called

• stochastic string function (SSF) if it is associated with a discrete random
process, that is, iff (a), (b) and (c) of theorem 2.1 apply,

• unconstrained stochastic string function (USSF) if only (a) and (b) apply
(in accordance with the terminology of [16]) and

• generalized unconstrained stochastic string function (GUSSF) if only
(b) applies.

In the following, the terms (generalized unconstrained) random process
and (GU)SSF will be used interchangeably. Furthermore, note that
$p(a_1...a_n)$ just is a different notation for $p_{a_1...a_n}$ which was used
in [16].
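The three classes of definition 2.2 can be told apart numerically on a finite set of strings. The following is a minimal sketch, assuming that a string function is given as a Python callable on strings; the helper names `strings`, `classify` and `bernoulli` are hypothetical and not part of the original text. A finite check of this kind can only refute a property on the strings inspected, it cannot establish it on all of $\Sigma^*$.

```python
from itertools import product


def strings(alphabet, n):
    """All strings of length exactly n over the given alphabet."""
    return ["".join(t) for t in product(alphabet, repeat=n)]


def classify(p, alphabet, max_len, tol=1e-12):
    """Test (a), (b), (c) of theorem 2.1 on all strings of length <= max_len.

    Returns the most specific class name among SSF, USSF, GUSSF, or "none".
    """
    vs = [v for n in range(max_len + 1) for v in strings(alphabet, n)]
    a = all(p(v) >= -tol for v in vs)                                        # (a)
    b = all(abs(sum(p(v + s) for s in alphabet) - p(v)) <= tol for v in vs)  # (b)
    c = abs(p("") - 1.0) <= tol                                              # (c)
    if a and b and c:
        return "SSF"
    if a and b:
        return "USSF"
    if b:
        return "GUSSF"
    return "none"


def bernoulli(q):
    """Hypothetical example: i.i.d. process emitting '1' with probability q."""
    def p(v):
        out = 1.0
        for s in v:
            out *= q if s == "1" else 1.0 - q
        return out
    return p


p = bernoulli(0.3)
print(classify(p, "01", 4))                      # SSF
print(classify(lambda v: 2.0 * p(v), "01", 4))   # USSF, since p(empty) = 2
print(classify(lambda v: -p(v), "01", 4))        # GUSSF, since values are negative
```

Scaling an SSF by a positive constant keeps (a) and (b) but breaks (c), and flipping its sign keeps only the linear condition (b), which is why the last two calls land in the larger classes.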

2.2 Dimension of String Functions 

The following definitions are fundamental for this work. 
Definition 2.3. Let $p : \Sigma^* \to \mathbb{R}$ be a string function over
$\Sigma$. Then

$$\mathcal{P}_p := [p(wv)]_{v,w \in \Sigma^*} \in \mathbb{R}^{\Sigma^* \times \Sigma^*} \tag{4}$$

is called the Hankel matrix of $p$ (also called prediction matrix in case of an
SSF $p$). We define

$$\dim p := \operatorname{rk} \mathcal{P}_p \tag{5}$$

to be the dimension of $p$. In case of $\dim p < \infty$ the string function
$p$ is said to be finite-dimensional.

Example 2.4. Let $p : \Sigma^* \to \mathbb{R}$ be a string function over the
binary alphabet $\Sigma = \{0, 1\}$.

$$\mathcal{P}_p = \begin{pmatrix}
p(\square) & p(0) & p(1) & p(00) & p(01) & p(10) & p(11) & \cdots \\
p(0) & p(00) & p(10) & p(000) & p(010) & p(100) & p(110) & \cdots \\
p(1) & p(01) & p(11) & p(001) & p(011) & p(101) & p(111) & \cdots \\
p(00) & p(000) & p(100) & p(0000) & p(0100) & p(1000) & p(1100) & \cdots \\
p(01) & p(001) & p(101) & p(0001) & p(0101) & p(1001) & p(1101) & \cdots \\
p(10) & p(010) & p(110) & p(0010) & p(0110) & p(1010) & p(1110) & \cdots \\
p(11) & p(011) & p(111) & p(0011) & p(0111) & p(1011) & p(1111) & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}$$

then is the Hankel matrix where strings of finite length have been ordered
by length and, within each length, lexicographically. Note that within a row
values refer to strings that have the same suffix whereas within a column
values refer to strings that have the same prefix. See also [1] for an example
of a Hankel matrix.
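To make the layout of example 2.4 concrete, the following sketch builds the upper-left block of the Hankel matrix and computes its rank exactly. It assumes that the string function is a Python callable returning `Fraction` values; the helpers `strings_up_to`, `hankel_block`, `rank`, `iid` and `markov`, as well as the two example processes, are illustrative choices and not taken from the original text.

```python
from fractions import Fraction as F
from itertools import product


def strings_up_to(alphabet, n):
    """Strings of length <= n, ordered by length, then lexicographically."""
    return ["".join(t) for k in range(n + 1) for t in product(alphabet, repeat=k)]


def hankel_block(p, alphabet, n, m):
    """Rows are indexed by suffixes v (|v| <= n), columns by prefixes w (|w| <= m)."""
    return [[p(w + v) for w in strings_up_to(alphabet, m)]
            for v in strings_up_to(alphabet, n)]


def rank(matrix):
    """Rank by exact Gaussian elimination over the rationals."""
    m = [list(row) for row in matrix]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def iid(q):
    """Hypothetical example: i.i.d. process emitting '1' with probability q."""
    def p(v):
        out = F(1)
        for s in v:
            out *= q if s == "1" else 1 - q
        return out
    return p


def markov(pi, A):
    """Hypothetical example: Markov chain on {0, 1} with initial law pi, transitions A."""
    def p(v):
        out = pi[int(v[0])] if v else F(1)
        for s, t in zip(v, v[1:]):
            out *= A[int(s)][int(t)]
        return out
    return p


p_iid = iid(F(1, 3))
p_mc = markov([F(1, 2), F(1, 2)], [[F(3, 4), F(1, 4)], [F(1, 3), F(2, 3)]])
print(rank(hankel_block(p_iid, "01", 2, 2)))   # 1
print(rank(hankel_block(p_mc, "01", 2, 2)))    # 2
```

For the i.i.d. process every entry factors as $p(w)p(v)$, so the block has rank 1; the chosen Markov chain has an invertible transition matrix and yields rank 2.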

The following characterization of finite-dimensional string functions is the 
major source of motivation for this work. 

Theorem 2.5 ([13]). Let $p : \Sigma^* \to \mathbb{R}$ be a string function.
Then the following conditions are equivalent.

(i) $p$ has dimension at most $d$.

(ii) There exist vectors $x, y \in \mathbb{R}^d$ as well as matrices
$T_a \in \mathbb{R}^{d \times d}$ for all $a \in \Sigma$ such that

$$\forall v \in \Sigma^* : \quad p(v = a_1...a_n) = \langle y | T_{a_n} ... T_{a_1} | x \rangle. \tag{6}$$

Fully elaborated proofs of theorem 2.5 can be found in the literature cited
above. Note that (6) can be transformed to

$$p(v) = \operatorname{tr} T_{a_n} ... T_{a_1} C \tag{7}$$

where $C = x y^T \in \mathbb{R}^{d \times d}$.

Example 2.6. The most prominent examples of finite-dimensional SSFs are
hidden Markov chains. Let $p$ be an SSF associated with a hidden Markov
chain on $d$ hidden states and output alphabet $\Sigma$. Let
$A = (P(i \to j))_{1 \le i,j \le d}$ be the transition probability matrix,
$E_{ia}$, $1 \le i \le d$, $a \in \Sigma$ be the emission probabilities and
$\pi$ be the initial probability distribution. We define

$$O_a := \operatorname{diag}(E_{ia}, i = 1, ..., d) \in \mathbb{R}^{d \times d} \tag{8}$$

and further

$$T_a := A^T O_a \in \mathbb{R}^{d \times d}.$$

The $T_a$ together with $y := (1, ..., 1)^T \in \mathbb{R}^d$,
$x := \pi \in \mathbb{R}^d$ then provide a representation corresponding to (6).
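The construction of example 2.6 can be sketched in a few lines. The code below assumes that the transition matrix is given as a nested list `A[i][j]` and the emission probabilities as a list of dictionaries `E[i][a]`; the function names and the two-state parameters are hypothetical and serve only as an illustration.

```python
from fractions import Fraction as F
from itertools import product


def hmm_operators(A, E, alphabet):
    """Return {a: T_a} with T_a = A^T O_a and O_a = diag(E[i][a])."""
    d = len(A)
    # (A^T O_a)[i][j] = A[j][i] * E[j][a]
    return {a: [[A[j][i] * E[j][a] for j in range(d)] for i in range(d)]
            for a in alphabet}


def string_probability(T, pi, v):
    """p(v) = <y| T_{a_n} ... T_{a_1} |x> with x = pi and y = (1, ..., 1)."""
    x = list(pi)
    for a in v:  # T_{a_1} is applied first, T_{a_n} last
        x = [sum(t * xi for t, xi in zip(row, x)) for row in T[a]]
    return sum(x)


# Hypothetical two-state example over the alphabet {0, 1}.
A = [[F(9, 10), F(1, 10)], [F(2, 10), F(8, 10)]]          # A[i][j] = P(i -> j)
E = [{"0": F(3, 4), "1": F(1, 4)}, {"0": F(1, 5), "1": F(4, 5)}]
pi = [F(1, 2), F(1, 2)]
T = hmm_operators(A, E, "01")

print(string_probability(T, pi, "0"))    # 19/40
total = sum(string_probability(T, pi, "".join(s)) for s in product("01", repeat=3))
print(total)                              # 1
```

The value for the string "0" agrees with the direct computation $\frac12 \cdot \frac34 + \frac12 \cdot \frac15$, and the probabilities of all strings of a fixed length sum to one because the columns of $A^T$ sum to one.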

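We will be particularly interested in finite-dimensional GUSSFs (we recall
definition 2.2). The following theorem provides a characterization.

Theorem 2.7. Let $p : \Sigma^* \to \mathbb{R}$ be a string function such that
$\dim p \le d$. Then the following two statements are equivalent:

(i) $p$ is a GUSSF, that is, $\sum_{a \in \Sigma} p(va) = p(v)$ for all
$v \in \Sigma^*$.

(ii) There exist vectors $x, y \in \mathbb{R}^d$ as well as matrices
$T_a \in \mathbb{R}^{d \times d}$ for all $a \in \Sigma$ such that

$$\forall v \in \Sigma^* : \quad p(v = a_1...a_n) = \langle y | T_{a_n} ... T_{a_1} | x \rangle \tag{9}$$

as well as

$$y^T \sum_{a \in \Sigma} T_a = y^T, \tag{10}$$

which means that $y$ is an eigenvector for the eigenvalue 1 of the transpose
of $\sum_{a \in \Sigma} T_a$.

In the following, we will write

$$T_v := T_{a_n} ... T_{a_1}, \quad T_w := T_{b_m} ... T_{b_1} \tag{11}$$

in case of $v = a_1...a_n \in \Sigma^n$, $w = b_1...b_m \in \Sigma^m$.

Proof. The obvious direction is "$\Leftarrow$":

$$\sum_{a \in \Sigma} p(va) = \sum_{a \in \Sigma} \langle y | T_a T_v | x \rangle = \langle y | \sum_{a \in \Sigma} T_a | T_v x \rangle \overset{(10)}{=} \langle y | T_v | x \rangle = p(v). \tag{12}$$

For "$\Rightarrow$", let $d^* := \dim p \le d$ be the actual dimension of $p$.
According to theorem 2.5 we find matrices
$\tilde T_a \in \mathbb{R}^{d^* \times d^*}$, $a \in \Sigma$ and vectors
$\tilde x, \tilde y \in \mathbb{R}^{d^*}$ such that

$$\forall v \in \Sigma^* : \quad p(v) = \langle \tilde y | \tilde T_v | \tilde x \rangle. \tag{13}$$

In case of $d^* = d$ we will have proven the claim by putting
$T_a := \tilde T_a$, $x := \tilde x$, $y := \tilde y$. In case of $d^* < d$ we
will obtain suitable matrices $T_a \in \mathbb{R}^{d \times d}$ and vectors
$x, y \in \mathbb{R}^d$ by putting

$$(T_a)_{ij} := \begin{cases} (\tilde T_a)_{ij} & 1 \le i, j \le d^* \\ 0 & \text{else} \end{cases} \tag{14}$$

and by extending $\tilde x$, $\tilde y$ by zero entries in the same way.

From theorem 2.5 we obtain matrices $T_a \in \mathbb{R}^{d^* \times d^*}$,
$a \in \Sigma$ and vectors $x, y \in \mathbb{R}^{d^*}$ such that

$$p(v = a_1...a_n) = \langle y | T_v | x \rangle. \tag{15}$$

Condition (i) then implies that

$$\langle y | \sum_{a \in \Sigma} T_a | T_v x \rangle = \sum_{a \in \Sigma} p(va) = p(v) = \langle y | T_v x \rangle \quad \text{for all } v \in \Sigma^*. \tag{16}$$

It remains to show that

$$\operatorname{span}\{T_v x \mid v \in \Sigma^*\} = \mathbb{R}^{d^*}. \tag{17}$$

However, assuming the contrary would lead to the contradiction

$$d^* = \dim p = \operatorname{rk} [p(wv)]_{v,w \in \Sigma^*} = \operatorname{rk} [\langle y | T_v T_w | x \rangle]_{v,w \in \Sigma^*} \le \dim \operatorname{span}\{T_w x \mid w \in \Sigma^*\} < d^*. \quad \diamond \tag{18}$$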

Matrices $T_a$ can be computationally determined according to a procedure
which we will describe in the following. To this end, for a string function
$p$ and strings $v, w \in \Sigma^*$, we introduce the notation

$$p_v : \Sigma^* \to \mathbb{R},\ w \mapsto p(wv) \quad \text{resp.} \quad p^w : \Sigma^* \to \mathbb{R},\ v \mapsto p(wv). \tag{19}$$

That is, the $p_v$ resp. $p^w$ are the row resp. column vectors of the Hankel
matrix $\mathcal{P}_p$. These are string functions in their own right. Note
that $p_\square = p^\square = p$. Moreover, note that in case of a stochastic
process $p$ s.t. $p(w = b_1...b_m) \ne 0$ it holds that

$$\frac{1}{p(w)}\, p^w(v = a_1...a_n) = P(\{X_{m+1} = a_1, ..., X_{m+n} = a_n\} \mid \{X_1 = b_1, ..., X_m = b_m\}). \tag{20}$$

Therefore, $\frac{1}{p(w)} p^w$ is just the discrete random process governed by
the probabilities of $p$ conditioned on $w$ having already been emitted.

The following is a generic algorithmic strategy to infer matrices
$T_a \in \mathbb{R}^{d \times d}$ and $x, y \in \mathbb{R}^d$ corresponding to
(6) from a finite-dimensional Hankel matrix. At this point, the algorithm
needs the entire string function $p$ as an input. We will explain how to
obtain a practical version of this generic strategy later in this section.

Algorithm 2.8.

Input: A string function $p$ such that $\dim p = d < \infty$.

Output: Matrices $T_a \in \mathbb{R}^{d \times d}$, $a \in \Sigma$ and vectors
$x, y \in \mathbb{R}^d$ such that

$$p(v = a_1...a_n) = \operatorname{tr} T_{a_n} ... T_{a_1} x y^T. \tag{21}$$

1. Determine words $v_1, ..., v_d$ resp. $w_1, ..., w_d$ such that the
$p_{v_i}$ resp. the $p^{w_j}$ span the row resp. column space of
$\mathcal{P}_p$. Hence the matrix

$$V := [p(w_j v_i)]_{1 \le i,j \le d} \tag{22}$$

has full rank $d = \dim p$.

2. Denote by $\vec v_i$ resp. $\vec w_j$ the $i$-th row resp. the $j$-th column
of $V$ and define

$$x = (x_1, ..., x_d)^T := (p(v_1), ..., p(v_d))^T \tag{23}$$

and $y = (y_1, ..., y_d) \in \mathbb{R}^d$ such that

$$(p(w_1), ..., p(w_d)) = \sum_{i=1}^{d} y_i \vec v_i \tag{24}$$

which can be done as $p_\square = p$ (the uppermost row of the Hankel matrix)
is linearly dependent of the $p_{v_i}$ (the basis of the row space of the
Hankel matrix).

3. For each $a \in \Sigma$, determine matrices

$$W_a := [p(w_j a v_i)]_{1 \le i,j \le d} \in \mathbb{R}^{d \times d}. \tag{25}$$

4. One can then show that $x$, $y$ and $T_a := W_a V^{-1}$, $a \in \Sigma$ are
as needed for theorem 2.5.
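The four steps of algorithm 2.8 can be sketched as follows. This is an illustration under stated assumptions rather than the efficient procedure referred to later in this section: the string function is assumed to be available as a callable returning exact `Fraction` values, `d` is treated as an upper bound on the dimension, and basis words are found by brute-force search among all words of length at most $d - 1$ (which is exponential in $d$). All function names and the example Markov chain are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def transpose(M):
    return [list(col) for col in zip(*M)]


def independent_rows(M):
    """Greedy choice of indices of a maximal linearly independent set of rows."""
    basis, chosen = [], []                 # basis holds (pivot column, reduced row)
    for idx, row in enumerate(M):
        r = list(row)
        for pc, b in basis:
            if r[pc] != 0:
                f = r[pc] / b[pc]
                r = [x - f * y for x, y in zip(r, b)]
        pc = next((j for j, x in enumerate(r) if x != 0), None)
        if pc is not None:
            basis.append((pc, r))
            chosen.append(idx)
    return chosen


def solve(M, B):
    """Solve M X = B for an invertible square M (Gauss-Jordan, exact)."""
    n = len(M)
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        piv = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        pv = aug[c][c]
        aug[c] = [x / pv for x in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]


def learn_operators(p, alphabet, d):
    """Steps 1-4 of algorithm 2.8, searching basis words of length <= d - 1."""
    words = ["".join(t) for k in range(d) for t in product(alphabet, repeat=k)]
    H = [[p(w + v) for w in words] for v in words]   # rows: suffix v, columns: prefix w
    rows = independent_rows(H)
    cols = independent_rows(transpose([H[i] for i in rows]))
    vs, ws = [words[i] for i in rows], [words[j] for j in cols]
    Vt = transpose([[p(w + v) for w in ws] for v in vs])           # V^T, cf. (22)
    x = [p(v) for v in vs]                                         # (23)
    y = [r[0] for r in solve(Vt, [[p(w)] for w in ws])]            # (24): V^T y = (p(w_j))_j
    T = {}
    for a in alphabet:
        Wt = transpose([[p(w + a + v) for w in ws] for v in vs])   # W_a^T, cf. (25)
        T[a] = transpose(solve(Vt, Wt))                            # T_a = W_a V^{-1}
    return T, x, y


def evaluate(T, x, y, v):
    """<y| T_{a_n} ... T_{a_1} |x> for v = a_1 ... a_n."""
    for a in v:
        x = [sum(t * xi for t, xi in zip(row, x)) for row in T[a]]
    return sum(yi * xi for yi, xi in zip(y, x))


def markov(pi, A):
    """Hypothetical input: a Markov chain on {0, 1} viewed as a string function."""
    def p(v):
        out = pi[int(v[0])] if v else F(1)
        for s, t in zip(v, v[1:]):
            out *= A[int(s)][int(t)]
        return out
    return p


p = markov([F(1, 2), F(1, 2)], [[F(3, 4), F(1, 4)], [F(1, 3), F(2, 3)]])
T, x, y = learn_operators(p, "01", d=2)
print(all(evaluate(T, x, y, "".join(s)) == p("".join(s))
          for n in range(6) for s in product("01", repeat=n)))   # True
```

For this input the search selects $v_1 = w_1 = \square$ and $v_2 = w_2 = 0$, and the recovered operators reproduce $p$ on all strings tested although only values on strings of length at most 3 enter their construction.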

Clearly, the driving question behind algorithm 2.8 is its practicability. A
first clue to this is the following theorem. To state it, we set

$$\Sigma^{\le n} := \bigcup_{k=0}^{n} \Sigma^k \tag{26}$$

to be the set of all strings of length at most $n$ and

$$\mathcal{P}_{p,n,m} := [p(wv)]_{|v| \le n,\, |w| \le m} \in \mathbb{R}^{\Sigma^{\le n} \times \Sigma^{\le m}} \tag{27}$$

to be the finite minor of the Hankel matrix referring to row resp. column
vectors indexed by strings of length at most $n$ resp. $m$.

Theorem 2.9. Let $p : \Sigma^* \to \mathbb{R}$ be a string function such that
$\dim p \le d$. Then it holds that

$$\dim p = \operatorname{rk} \mathcal{P}_{p,d-1,d-1}. \tag{28}$$

This means that, given an upper bound $d$ on the dimension of $p$, the
dimension of $p$ can be determined by inspecting the finite matrix
$\mathcal{P}_{p,d-1,d-1}$. See [18] for a proof. Note, however, that the size
of $\mathcal{P}_{p,d-1,d-1}$ is exponential in $d$ such that naive approaches
to determining $V$ from (22) would result in exponential runtime. The final
clue to the practicability of algorithm 2.8 is an efficient algorithm to
determine $V$ which has recently been presented [19]. The algorithm applies in
case one is provided with an arbitrary generating system of the row or column
space of $\mathcal{P}_p$. Corresponding generating systems emerge naturally
for finite-dimensional processes of interest, in particular for hidden Markov
processes and also for quantum random walks.

A consequence of theorem 2.9 is

Theorem 2.10 ([18]). Let $p$ be a string function such that $\dim p \le d$.
Then $p$ is uniquely determined by the values

$$p(v), \quad |v| \le 2d - 1. \tag{29}$$

Proof Sketch. The idea is, given two string functions $p_1, p_2$ where
$\dim p_1, \dim p_2 \le d$ which coincide on strings of length up to $2d - 1$,
to determine matrices $T_a$ and vectors $x, y$ as in theorem 2.5 according to
algorithm 2.8 for both $p_1$ and $p_2$. Thanks to theorem 2.9, in algorithm 2.8
$V$ can be determined by inspecting values of $p$ at strings of length at most
$2d - 2$ in $\mathcal{P}_{p,d-1,d-1}$ and, subsequently, by inspecting strings
of length at most $2(d-1) + 1 = 2d - 1$ in order to obtain the $W_a$. As $p_1$
and $p_2$ coincide on strings of length up to $2d - 1$, this will result in the
same $V$ and $W_a$. Hence $p_1 = p_2$. $\diamond$

The following corollary is an obvious consequence of theorem 2.10 due
to property (b) from theorem 2.1. However, it had been well-known already
before. See e.g. [15] and the references therein.

Corollary 2.11 ([15]). A GUSSF $p$ such that $\dim p \le d$ is uniquely
determined by the values

$$p(v), \quad |v| = 2d - 1. \tag{30}$$

In other words, a discrete random process whose dimension can be upper
bounded by $d$ is uniquely determined by its probability distribution over the
strings of length $2d - 1$.

Remark 2.12. Note that for a string function $p$ with $\dim p \le d < \infty$,
rows and columns of the Hankel matrix indexed by strings of length at least $d$
must necessarily be linearly dependent of their counterparts referring to
strings of length at most $d - 1$. These observations are crucial for the core
result of the following section.

3 Finite-Dimensional Models

Finite-dimensional models over $\Sigma$ are defined to be the polynomial maps

$$g_{n,d} : \; S_d \subset \mathbb{C}^{|\Sigma| d^2 + 2d} \to \mathbb{C}^{|\Sigma|^n}, \quad ((T_a)_{a \in \Sigma}, x, y) \mapsto (\langle y | T_{a_n} ... T_{a_1} | x \rangle)_{v = a_1...a_n \in \Sigma^n} \tag{31}$$

where

$$((T_a)_{a \in \Sigma}, x, y) \in S_d \quad :\Longleftrightarrow \quad y^T \sum_{a \in \Sigma} T_a = y^T. \tag{32}$$

According to theorem 2.7, $S_d$ comprises precisely the parameterizations of
the generalized unconstrained random processes of dimension at most $d$.
Obviously,

$$S_d \cong \mathbb{C}^{(|\Sigma| - 1) d^2 + d(d-1) + 2d}. \tag{33}$$

Therefore, the Zariski closure of image$(g_{n,d})$ is an irreducible variety.

In the following, we will make use of the polynomial map (31) to derive a
set-theoretic theorem with a strong view towards the invariants of the Zariski
closure of the image of $g_{n,d}$. In case of $n \ge 2d - 1$, invariants for
the image of $g_{n,d}$ can be derived by inspection of the Hankel matrix. As in
(27), let $\mathcal{P}_{p,n,m}$ be the partial Hankel matrix that is filled
with all values $p(wv)$ such that $|v| \le n$, $|w| \le m$.

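Theorem 3.1. Let $n \ge 2d - 1$ and $(p(v))_{v \in \Sigma^n}$ be an
(unconstrained) probability distribution. Then it holds that

$$(p(v))_{v \in \Sigma^n} \in \operatorname{image}(g_{n,d})$$

if and only if the following two conditions apply where, in case of $|u| < n$,

$$p(u) = \sum_{v \in \Sigma^{n - |u|}} p(uv). \tag{34}$$

(a)

$$\det [p(w_j v_i)]_{1 \le i,j \le d+1} = 0 \tag{35}$$

for all choices of words $v_1, ..., v_{d+1}, w_1, ..., w_{d+1}$ of length at
most $d - 1$, which can be equivalently put as

$$\operatorname{rk} \mathcal{P}_{p,d-1,d-1} \le d. \tag{36}$$

(b)

$$\operatorname{rk} \mathcal{P}_{p,\lceil n/2 \rceil,\lfloor n/2 \rfloor} = \operatorname{rk} \mathcal{P}_{p,\lfloor n/2 \rfloor,\lceil n/2 \rceil} = \operatorname{rk} \mathcal{P}_{p,d-1,d-1}. \tag{37}$$

(37) states that rows resp. columns in
$\mathcal{P}_{p,\lceil n/2 \rceil,\lfloor n/2 \rfloor}$ and
$\mathcal{P}_{p,\lfloor n/2 \rfloor,\lceil n/2 \rceil}$ referring to row
strings $v$ resp. column strings $w$ where $|v|, |w| \ge d$ are linearly
dependent of their counterparts in
$\mathcal{P}_{p,\lceil n/2 \rceil,\lfloor n/2 \rfloor}$ and
$\mathcal{P}_{p,\lfloor n/2 \rfloor,\lceil n/2 \rceil}$ that refer to row
resp. column strings of length at most $d - 1$.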
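Conditions (a) and (b) can be tested mechanically once the values for shorter strings have been filled in via (34). The following sketch assumes a distribution given as a dictionary from strings of length $n$ to exact `Fraction` values; the function names and both example distributions (a two-state hidden Markov chain and a period-3 process with uniformly random phase) are hypothetical illustrations.

```python
from fractions import Fraction as F
from itertools import product


def words_up_to(alphabet, n):
    return ["".join(t) for k in range(n + 1) for t in product(alphabet, repeat=k)]


def extend(dist, alphabet, n):
    """Values p(u) for all |u| <= n, obtained from the table over Sigma^n via (34)."""
    p = dict(dist)
    for k in range(n - 1, -1, -1):
        for t in product(alphabet, repeat=k):
            u = "".join(t)
            p[u] = sum(p[u + a] for a in alphabet)
    return p


def rank(matrix):
    """Rank by exact Gaussian elimination."""
    m = [list(row) for row in matrix]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def block(p, alphabet, n, m):
    """Partial Hankel matrix: rows |v| <= n, columns |w| <= m, entries p(wv)."""
    return [[p[w + v] for w in words_up_to(alphabet, m)]
            for v in words_up_to(alphabet, n)]


def satisfies_conditions(dist, alphabet, n, d):
    """Check (a) in the form (36) and (b) in the form (37); requires n >= 2d - 1."""
    p = extend(dist, alphabet, n)
    hi, lo = (n + 1) // 2, n // 2
    r0 = rank(block(p, alphabet, d - 1, d - 1))
    return (r0 <= d and rank(block(p, alphabet, hi, lo)) == r0
            and rank(block(p, alphabet, lo, hi)) == r0)


def hmm_table(A, E, pi, alphabet, n):
    """Length-n distribution of a hypothetical hidden Markov chain (example 2.6)."""
    d, table = len(A), {}
    for t in product(alphabet, repeat=n):
        x = list(pi)
        for a in t:
            x = [sum(A[j][i] * E[j][a] * x[j] for j in range(d)) for i in range(d)]
        table["".join(t)] = sum(x)
    return table


A = [[F(9, 10), F(1, 10)], [F(2, 10), F(8, 10)]]
E = [{"0": F(3, 4), "1": F(1, 4)}, {"0": F(1, 5), "1": F(4, 5)}]
two_state = hmm_table(A, E, [F(1, 2), F(1, 2)], "01", 4)
periodic = {"".join(t): F(0) for t in product("01", repeat=4)}
for s in ("0010", "0100", "1001"):   # pattern 001 repeated, uniformly random phase
    periodic[s] = F(1, 3)

print(satisfies_conditions(two_state, "01", 4, 2))   # True
print(satisfies_conditions(periodic, "01", 4, 2))    # False
```

The periodic process passes (a) but fails (b): its block on words of length at most 1 has rank 2, whereas the block on words of length at most 2 has rank 3.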

Proof. "$\Rightarrow$": Let $(p(u))_{u \in \Sigma^n}$ be in the image of
$g_{n,d}$. Theorem 2.5 states that the Hankel matrix $\mathcal{P}_p$ of $p$ has
rank at most $d$. This implies (a) as it just expresses that some Hankel matrix
minors of size $d + 1$ do not have full rank.

Theorem 2.9 then states that bases of the row resp. the column space
of $\mathcal{P}_p$ can be obtained by inspecting row resp. column vectors
referring to strings of length at most $d - 1$ which implies (b).

"$\Leftarrow$": Let $(p(u))_{u \in \Sigma^n}$ be such that (a), (b) apply. In
order to prove that $(p(u))_{u \in \Sigma^n} \in \operatorname{image} g_{n,d}$,
we have to provide a parameterization $((T_a)_{a \in \Sigma}, x, y) \in S_d$
such that

$$p(u = a_1...a_n) = y^T T_{a_n} ... T_{a_1} x \tag{38}$$

for all strings $u \in \Sigma^n$. To this end, we will provide a
parameterization
$((T_a)_{a \in \Sigma}, x, y) \in \mathbb{C}^{|\Sigma| d^2 + 2d}$ such that

$$p(u = a_1...a_k) = y^T T_{a_k} ... T_{a_1} x \tag{39}$$

for all strings $u$ such that $|u| \le n$ where $p(u)$ is defined according to
(34) in case of $|u| < n$. By this definition of $p(u)$, $|u| < n$, it is
straightforward to show that $((T_a)_{a \in \Sigma}, x, y) \in S_d$ which
completes the proof. Furthermore, note that it suffices to provide a
parameterization $((T_a)_{a \in \Sigma}, x, y) \in S_{d^*}$ for arbitrary
$d^* \le d$ since, in case of $d^* < d$, we extend the $T_a$ as well as $x, y$
by zero entries to obtain a $d$-dimensional parametrization from $S_d$.
Combining these facts, we have to show that, for suitable $d^* \le d$, there
are matrices $T_a \in \mathbb{R}^{d^* \times d^*}$ and vectors
$x, y \in \mathbb{R}^{d^*}$ such that (39) holds.

We obtain the desired parameterization $((T_a)_{a \in \Sigma}, x, y)$ according
to the ideas of algorithm 2.8. First, determine strings $v_1, ..., v_{d^*}$
and $w_1, ..., w_{d^*}$ of length at most $d - 1$ such that

$$V := [p(w_j v_i)]_{1 \le i,j \le d^*} \tag{40}$$

has full rank $d^* := \operatorname{rk} \mathcal{P}_{p,d-1,d-1} \le d$. We
define

$$x = (x_1, ..., x_{d^*})^T := (p(v_1), ..., p(v_{d^*}))^T \tag{41}$$

and $y = (y_1, ..., y_{d^*}) \in \mathbb{R}^{d^*}$ such that

$$(p(w_1), ..., p(w_{d^*})) = \sum_{i=1}^{d^*} y_i \vec v_i \tag{42}$$

where $\vec v_i = (p(w_1 v_i), ..., p(w_{d^*} v_i))$ is the $i$-th row of $V$,
which can be done since the uppermost row of $\mathcal{P}_{p,n,d-1}$ is
linearly dependent of the rows referring to the strings $v_i$. Furthermore,
for each $a \in \Sigma$, we determine matrices

$$W_a := [p(w_j a v_i)]_{1 \le i,j \le d^*} \in \mathbb{R}^{d^* \times d^*}. \tag{43}$$

Note that probabilities in $W_a$ may refer to strings of length up to
$2d - 1$ which establishes the necessity of the assumption $n \ge 2d - 1$. We
then claim that defining

$$T_a := W_a V^{-1} \tag{44}$$

gives rise to the desired parametrization in terms of (39). We will obtain an
easy proof of this claim by means of the following elementary lemmata.

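Lemma 3.2. For all $v, w \in \Sigma^*$ such that $|wv| \le \lceil n/2 \rceil$
($T_v = T_{a_k} ... T_{a_1}$, $v = a_1...a_k \in \Sigma^k$):

$$T_v \begin{pmatrix} p(w v_1) \\ \vdots \\ p(w v_{d^*}) \end{pmatrix} = \begin{pmatrix} p(w v v_1) \\ \vdots \\ p(w v v_{d^*}) \end{pmatrix}. \tag{45}$$

Proof of lemma 3.2. Note first that
$|v_i| \le d - 1 \le \frac{n-1}{2} \le \lfloor n/2 \rfloor$ which implies
$|w v v_i| \le n$. As $(p(w v_1), ..., p(w v_{d^*}))^T$ is contained in the
column space of $V$ it suffices to show the statement for $w = w_j$. We do this
by induction on $|v|$:

$|v| = 1$ ($v = a$):

$$T_a \begin{pmatrix} p(w_j v_1) \\ \vdots \\ p(w_j v_{d^*}) \end{pmatrix} = W_a V^{-1} \begin{pmatrix} p(w_j v_1) \\ \vdots \\ p(w_j v_{d^*}) \end{pmatrix} = W_a e_j = \begin{pmatrix} p(w_j a v_1) \\ \vdots \\ p(w_j a v_{d^*}) \end{pmatrix}. \tag{46}$$

$|v| \to |v| + 1$: Let $\bar v = a v$ where $a \in \Sigma$.

$$T_{\bar v} \begin{pmatrix} p(w_j v_1) \\ \vdots \\ p(w_j v_{d^*}) \end{pmatrix} = T_v T_a \begin{pmatrix} p(w_j v_1) \\ \vdots \\ p(w_j v_{d^*}) \end{pmatrix} \overset{|v| = 1}{=} T_v \begin{pmatrix} p(w_j a v_1) \\ \vdots \\ p(w_j a v_{d^*}) \end{pmatrix} \overset{(*)}{=} \begin{pmatrix} p(w_j a v v_1) \\ \vdots \\ p(w_j a v v_{d^*}) \end{pmatrix} = \begin{pmatrix} p(w_j \bar v v_1) \\ \vdots \\ p(w_j \bar v v_{d^*}) \end{pmatrix} \tag{47}$$

where (*) follows from the induction hypothesis.

Lemma 3.3. For all $v, w \in \Sigma^*$ such that
$|w|, |v| \le \lceil n/2 \rceil$, $|wv| \le n$
($T_v = T_{a_k} ... T_{a_1}$, $v = a_1...a_k \in \Sigma^k$):

$$y^T T_v \begin{pmatrix} p(w v_1) \\ \vdots \\ p(w v_{d^*}) \end{pmatrix} = p(wv). \tag{48}$$

Proof of lemma 3.3. Note that the column in
$\mathcal{P}_{p,\lceil n/2 \rceil,\lfloor n/2 \rfloor}$ resp.
$\mathcal{P}_{p,\lfloor n/2 \rfloor,\lceil n/2 \rceil}$ referring to $w$ is
contained in the span of the columns referring to the $w_j$'s, according to the
choice of the $w_j$. Therefore, it suffices to show the statement for
$w = w_j$. We do this by induction on $|v|$:

$|v| = 0$ ($v = \square$, $T_\square = \mathrm{Id}$):

$$y^T T_\square \begin{pmatrix} p(w_j v_1) \\ \vdots \\ p(w_j v_{d^*}) \end{pmatrix} = y^T \begin{pmatrix} p(w_j v_1) \\ \vdots \\ p(w_j v_{d^*}) \end{pmatrix} = p(w_j) \tag{49}$$

follows from the choice of $y$.

$|v| \to |v| + 1$: Let $\bar v = a v$, $a \in \Sigma$.

$$y^T T_{\bar v} \begin{pmatrix} p(w_j v_1) \\ \vdots \\ p(w_j v_{d^*}) \end{pmatrix} = y^T T_v T_a \begin{pmatrix} p(w_j v_1) \\ \vdots \\ p(w_j v_{d^*}) \end{pmatrix} \overset{L. 3.2}{=} y^T T_v \begin{pmatrix} p(w_j a v_1) \\ \vdots \\ p(w_j a v_{d^*}) \end{pmatrix} \overset{(*)}{=} p(w_j a v) = p(w_j \bar v) \tag{50}$$

where (*) follows from the induction hypothesis.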



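Proof of theorem 3.1 (cont.): Let $u \in \Sigma^*$ such that $|u| \le n$. Split
$u = wv$ into two strings $w, v$ such that $|w|, |v| \le \lceil n/2 \rceil$.
We compute

$$\begin{aligned}
y^T T_u x &= y^T T_v T_w x = y^T T_v T_w \begin{pmatrix} p(v_1) \\ \vdots \\ p(v_{d^*}) \end{pmatrix} = y^T T_v T_w \begin{pmatrix} p(\square v_1) \\ \vdots \\ p(\square v_{d^*}) \end{pmatrix} \\
&\overset{L. 3.2,\ |\square w| \le \lceil n/2 \rceil}{=} y^T T_v \begin{pmatrix} p(w v_1) \\ \vdots \\ p(w v_{d^*}) \end{pmatrix} \overset{L. 3.3}{=} p(wv) = p(u)
\end{aligned} \tag{51}$$

where we have replaced $v$ resp. $w$ of lemma 3.2 by $w$ resp. $\square$ here
in order to obtain the fourth equation. $\diamond$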



Due to theorem 3.1, invariants that are induced by conditions (a) and
(b) fully describe the finite-dimensional model $g_{n,d}$ for $n \ge 2d - 1$,
hence generate the ideal of model invariants.

Example 3.4. Consider

$$\mathcal{P}_{p,2,2} = \begin{pmatrix}
p(\square) & p(0) & p(1) & p(00) & p(01) & p(10) & p(11) \\
p(0) & p(00) & p(10) & p(000) & p(010) & p(100) & p(110) \\
p(1) & p(01) & p(11) & p(001) & p(011) & p(101) & p(111) \\
p(00) & p(000) & p(100) & p(0000) & p(0100) & p(1000) & p(1100) \\
p(01) & p(001) & p(101) & p(0001) & p(0101) & p(1001) & p(1101) \\
p(10) & p(010) & p(110) & p(0010) & p(0110) & p(1010) & p(1110) \\
p(11) & p(011) & p(111) & p(0011) & p(0111) & p(1011) & p(1111)
\end{pmatrix}$$

where $\Sigma = \{0, 1\}$. Condition (a) then translates to the only equation

$$\det \begin{pmatrix} p(\square) & p(0) & p(1) \\ p(0) & p(00) & p(10) \\ p(1) & p(01) & p(11) \end{pmatrix} = 0.$$






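The column conditions in (b) can be stated as follows:

$$\begin{aligned}
&(p(00), p(000), p(001), p(0000), p(0001), p(0010), p(0011))^T, \\
&(p(01), p(010), p(011), p(0100), p(0101), p(0110), p(0111))^T, \\
&(p(10), p(100), p(101), p(1000), p(1001), p(1010), p(1011))^T, \\
&(p(11), p(110), p(111), p(1100), p(1101), p(1110), p(1111))^T \\
&\quad \in \operatorname{span}\{\, (p(\square), p(0), p(1), p(00), p(01), p(10), p(11))^T, \\
&\qquad\qquad\ \ (p(0), p(00), p(01), p(000), p(001), p(010), p(011))^T, \\
&\qquad\qquad\ \ (p(1), p(10), p(11), p(100), p(101), p(110), p(111))^T \,\}.
\end{aligned}$$
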
The row conditions are completely analogous to the column conditions.
Clearly, invariants induced by (b) refer to polynomial rings

$$K[X_{ij}, Y_i,\ 1 \le i \le M,\ 1 \le j \le N] \tag{52}$$

and the smallest varieties therein that contain all points $x_{ij}, y_i$ such
that $(y_1, ..., y_M)$ is linearly dependent of
$(x_{11}, ..., x_{M1}), ..., (x_{1N}, ..., x_{MN})$. The Zariski closure of
the image of $g_{n,d}$ being an irreducible variety leads us to the following
conjecture.

Conjecture 3.5. Let $n \ge 2d - 1$.

$$(p(u))_{u \in \Sigma^n} \in \operatorname{image}(g_{n,d})$$

if and only if

$$\det [p(w_j v_i)]_{1 \le i,j \le d+1} = 0$$

for all choices of words $v_1, ..., v_{d+1}, w_1, ..., w_{d+1}$ such that
$|w_j v_i| \le n$.

Remark 3.6. The finite-dimensional models have to be handled with some
care. Even if an (unconstrained) probability distribution is in the image of
$g_{n,d}$, the finite-dimensional string function giving rise to it might not
be an (unconstrained) stochastic process, meaning that the string function is
not necessarily non-negative, since values referring to longer strings as
computed according to (6) might be negative. It is one of the big open problems
of the theory of finite-dimensional processes how to algorithmically determine
whether a set of matrices as in (6) gives rise to a non-negative string
function.

4 String Length Complexity 

In this section, we will prove a set-theoretical theorem, an ideal-theoretical
counterpart of which would yield, as a corollary, a proof of conjecture 11.9
in [3]. The theorem may be of interest in its own right, as the assumptions to
be met by the models under consideration are fairly mild.

Roughly speaking, an ideal-theoretical extension of the theorem would be
about how to lift sets of generators for models describing distributions over
strings of length $n$ to generators for distributions over strings of length
$n + 1$, given that $n$ is greater than the string length complexity of the
underlying models.

In the following,

$$\mathcal{M} \subset \mathbb{R}^{\Sigma^*} \tag{53}$$

is a class of USSFs.

Definition 4.1. Let $\mathcal{M} \subset \mathbb{R}^{\Sigma^*}$ be a class of
USSFs. We define the string length complexity of $\mathcal{M}$ to be

$$\operatorname{SLC}(\mathcal{M}) := \inf\{N \in \mathbb{N} \mid \forall p_1, p_2 \in \mathcal{M} : (p_1)|_{\Sigma^N} = (p_2)|_{\Sigma^N} \Rightarrow p_1 = p_2\}. \tag{54}$$

That is, members of $\mathcal{M}$ are uniquely determined by their
distributions over strings of length $\operatorname{SLC}(\mathcal{M})$.

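Given a class of USSFs, let

$$\mathcal{M}_n := \{(p(v))_{v \in \Sigma^n} \mid p \in \mathcal{M}\} \tag{55}$$

be the set of distributions over strings of length $n$ that are induced by the
members of $\mathcal{M}$. In case of $\operatorname{SLC}(\mathcal{M}) = n$ the
map

$$\pi_{\Sigma^n} : \mathcal{M} \to \mathcal{M}_n, \quad p \mapsto p|_{\Sigma^n} = (p(v))_{v \in \Sigma^n} \tag{56}$$

is one-to-one.

Theorem 4.2. Let $\mathcal{M}$ be a class of unconstrained random processes
such that

(i)

$$\operatorname{SLC}(\mathcal{M}) \le n - 1 < \infty. \tag{57}$$

(ii)

$$p \in \mathcal{M} \;\Longrightarrow\; \forall a \in \Sigma : p^a \in \mathcal{M}. \tag{58}$$

Then it holds that

$$(p(u), u \in \Sigma^{n+1}) \in \mathcal{M}_{n+1} \iff \begin{cases} (p(av), v \in \Sigma^n) \in \mathcal{M}_n \quad \forall a \in \Sigma \\ (p(v), v \in \Sigma^n) \in \mathcal{M}_n \end{cases} \tag{59}$$

where $p(v) = \sum_{a \in \Sigma} p(va)$.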
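The right-hand side of (59) involves $|\Sigma| + 1$ tables over $\Sigma^n$ derived from one table over $\Sigma^{n+1}$: one slice per leading letter and one marginal. The following sketch shows how these tables are formed; the dictionary representation, the function names and the i.i.d. example are assumptions made for illustration only.

```python
from fractions import Fraction as F
from itertools import product


def slices_and_marginal(dist, alphabet):
    """Split a table over Sigma^(n+1) into the |Sigma| + 1 tables over Sigma^n of (59)."""
    n = len(next(iter(dist))) - 1
    words = ["".join(t) for t in product(alphabet, repeat=n)]
    slices = {a: {v: dist[a + v] for v in words} for a in alphabet}     # (p(av))_v
    marginal = {v: sum(dist[v + a] for a in alphabet) for v in words}   # (sum_a p(va))_v
    return slices, marginal


def iid(q, n, alphabet="01"):
    """Hypothetical example: i.i.d. distribution over strings of length n."""
    out = {}
    for t in product(alphabet, repeat=n):
        pr = F(1)
        for s in t:
            pr *= q if s == "1" else 1 - q
        out["".join(t)] = pr
    return out


q = F(1, 3)
slices, marginal = slices_and_marginal(iid(q, 3), "01")
print(marginal == iid(q, 2))                                       # True
print(slices["1"] == {v: q * pr for v, pr in iid(q, 2).items()})   # True
```

Note that the slices are taken with respect to the first letter while the marginal sums over the last letter, in line with the two cases of (59).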

Remark 4.3. Theorem 4.2 is meant to be a first step to obtain an analogous
theorem resulting from replacing $\mathcal{M}_n$, $\mathcal{M}_{n+1}$ by their
Zariski closures $\overline{\mathcal{M}}_n$, $\overline{\mathcal{M}}_{n+1}$.
Generators for the ideal of invariants of $\overline{\mathcal{M}}_{n+1}$,
given generators for the ideal of invariants of $\overline{\mathcal{M}}_n$,
could be obtained by the following idea. If
$h \in \mathbb{C}[X_v, v \in \Sigma^n]$ is one of the generators for
$\overline{\mathcal{M}}_n$ where the $X_v$ are indeterminates for the
probabilities $p(v)$, $v \in \Sigma^n$, one obtains $|\Sigma| + 1$ generators
for $\overline{\mathcal{M}}_{n+1}$ by replacing the indeterminates $X_v$,
$v \in \Sigma^n$ by indeterminates $X_{av}$, $v \in \Sigma^n$ for all
$a \in \Sigma$ which results in new generators

$$h_a \in \mathbb{C}[X_{av}, v \in \Sigma^n] \subset \mathbb{C}[X_u, u \in \Sigma^{n+1}] \tag{60}$$

as well as replacing each $X_v$ by the polynomials
$\sum_a X_{va} \in \mathbb{C}[X_u, u \in \Sigma^{n+1}]$ resulting in another
generator

$$h_+ \in \mathbb{C}[\textstyle\sum_a X_{va}, v \in \Sigma^n] \subset \mathbb{C}[X_u, u \in \Sigma^{n+1}]. \tag{61}$$

The theorem would state that the generators obtained by this procedure
generate the ideal of invariants of $\overline{\mathcal{M}}_{n+1}$.

Note that in particular the maximum degree of the generators of
$\overline{\mathcal{M}}_{n+1}$ would be at most that of
$\overline{\mathcal{M}}_n$.

Proof. "$\Rightarrow$": From (58) we obtain that
$(p^a(v), v \in \Sigma^n) \in \mathcal{M}_n$ for each $a \in \Sigma$. The
second part is just the trivial observation that
$(p(u), u \in \Sigma^{n+1}) \in \mathcal{M}_{n+1}$ implies
$(p(v), v \in \Sigma^n) \in \mathcal{M}_n$.

"$\Leftarrow$": From the second case in (59) we obtain that
$(p(v), v \in \Sigma^n) \in \mathcal{M}_n$. As elements of $\mathcal{M}$ are
uniquely determined by their values for strings of length at least
$m := \operatorname{SLC}(\mathcal{M})$ and $n \ge m + 1$ we obtain a USSF
$\tilde p \in \mathcal{M}$ such that

$$\tilde p(v) = p(v) \quad \text{for all } v \in \Sigma^n. \tag{62}$$

It remains to show that also

$$\tilde p(u) = p(u) \quad \text{for all } u \in \Sigma^{n+1} \tag{63}$$

which amounts to showing that

$$\tilde p(av) = p(av) =: p^a(v) \quad \text{for all } (a, v) \in \Sigma \times \Sigma^n. \tag{64}$$

We further observe that

$$(p^a(v), v \in \Sigma^n) \in \mathcal{M}_n \tag{65}$$

for all $a \in \Sigma$, because of $n \ge m + 1 > m$, implies the existence of
a unique $q_a \in \mathcal{M}$ s.t.

$$q_a(v) = p^a(v) \quad \text{for all } v,\ |v| \le n. \tag{66}$$

As $\tilde p \in \mathcal{M}$, we have that $\tilde p^a \in \mathcal{M}$ for
all $a \in \Sigma$, due to (58). Moreover, for $u \in \Sigma^{n-1}$,

$$\tilde p^a(u) = \tilde p(au) = p(au) = q_a(u). \tag{67}$$

As $n - 1 \ge m$ and $\tilde p^a \in \mathcal{M}$ and $q_a \in \mathcal{M}$
coincide on strings of length $n - 1 \ge m$, we obtain

$$\tilde p^a = q_a \tag{68}$$

because of (i). We finally compute

$$\tilde p(av) = \tilde p^a(v) \overset{(68)}{=} q_a(v) \overset{(66)}{=} p^a(v) = p(av) \tag{69}$$

which establishes (64). $\diamond$
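The two substitutions described in remark 4.3 are purely mechanical and can be sketched as follows. The sketch assumes a polynomial to be stored as a dictionary that maps monomials (sorted tuples of index strings, repeated for powers) to coefficients; this representation, the function names and the example polynomial are hypothetical. Nothing is claimed here about whether the example polynomial is an invariant of any model; the code only checks that evaluating the lifted polynomials on a table over $\Sigma^{n+1}$ agrees with evaluating the original polynomial on the corresponding slice resp. marginal.

```python
from itertools import product


def lift_prefix(h, a):
    """h_a: replace every indeterminate X_v by X_{av}, cf. (60)."""
    return {tuple(sorted(a + v for v in mono)): c for mono, c in h.items()}


def lift_marginal(h, alphabet):
    """h_+: replace every X_v by sum_b X_{vb} and expand, cf. (61)."""
    out = {}
    for mono, c in h.items():
        for ext in product(alphabet, repeat=len(mono)):
            m = tuple(sorted(v + b for v, b in zip(mono, ext)))
            out[m] = out.get(m, 0) + c
    return {m: c for m, c in out.items() if c != 0}


def evaluate(h, point):
    """Evaluate a polynomial at a table {string: value}."""
    total = 0
    for mono, c in h.items():
        term = c
        for v in mono:
            term *= point[v]
        total += term
    return total


# Hypothetical polynomial in the X_v, |v| = 2: X_00 X_11 - X_01 X_10.
h = {("00", "11"): 1, ("01", "10"): -1}
alphabet = "01"
words2 = ["".join(t) for t in product(alphabet, repeat=2)]
# Arbitrary integer table over strings of length 3, used only for the check.
q = {"".join(t): i + 1 for i, t in enumerate(product(alphabet, repeat=3))}

slice_1 = {v: q["1" + v] for v in words2}
marg = {v: sum(q[v + b] for b in alphabet) for v in words2}
print(evaluate(lift_prefix(h, "1"), q) == evaluate(h, slice_1))       # True
print(evaluate(lift_marginal(h, alphabet), q) == evaluate(h, marg))   # True
```

Both substitutions are linear in the indeterminates, which is the reason why the degree of the lifted polynomials does not exceed that of $h$.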
4.1 Finite-Dimensional Models

Theorem 4.2 applies for the finite-dimensional models. Eq. (57) is established
by theorem 2.10 in subsection 2.2 whose statement is that finite-dimensional
processes $p$ of dimension at most $d$ are uniquely determined by the values
$p(v)$, $|v| = 2d - 1$.

In terms of the language introduced here, we can restate theorem 2.10 as
follows.

Theorem 4.4. Let

$$\mathcal{M}_d := \{p \in \mathbb{R}^{\Sigma^*} \mid p \text{ is USSF and } \dim p \le d\}$$

be the class of unconstrained processes of dimension at most $d$. Then it holds
that

$$\operatorname{SLC}(\mathcal{M}_d) = 2d - 1.$$

Furthermore observe that

$$(p^a)^w(v) = p^a(wv) = p(awv) = p^{aw}(v) \tag{70}$$

for all $a \in \Sigma$, $v, w \in \Sigma^*$ which translates to

$$(p^a)^w = p^{aw}. \tag{71}$$

Hence the column space of $\mathcal{P}_{p^a}$ is contained in that of
$\mathcal{P}_p$ which yields

$$\dim p^a \le \dim p \tag{72}$$

as $\dim p$ is just the dimension of the column space of $\mathcal{P}_p$.

This observation in combination with theorem 4.4 makes the assumptions
of theorem 4.2 hold for $\mathcal{M}_d$, which yields the following corollary.

Corollary 4.5. Let $n \ge 2d$. Then it holds that

$$(p(u), u \in \Sigma^{n+1}) \in \operatorname{image} g_{n+1,d} \iff \begin{cases} (p(av), v \in \Sigma^n) \in \operatorname{image} g_{n,d} \quad \forall a \in \Sigma \\ (p(v), v \in \Sigma^n) \in \operatorname{image} g_{n,d} \end{cases} \tag{73}$$

Again, an analogous ideal-theoretical result referring to the Zariski
closures of image $g_{n,d}$, image $g_{n+1,d}$ would yield that the maximum
degree of the generators would not increase for $n \ge 2d$.
5 Hidden Markov Models

In the following, let

$$f_{n,l} : \quad (A, (O_a)_{a \in \Sigma}, x) \mapsto (\operatorname{tr} T_{a_n} ... T_{a_1}\, x\, (1, ..., 1))_{v = a_1...a_n \in \Sigma^n} \tag{74}$$

where $A$ and the $O_a$ are as in example 2.6 (so that $T_a = A^T O_a$), be the
polynomial map associated with the unconstrained (constrained if and only if
$\sum_{i=1}^{l} x_i = 1$) hidden Markov model referring to hidden Markov
models acting on $l$ hidden states and distributions over strings of length
$n$, as described in [16].

The following theorem of Heller resulted from the attempts set off in the
late 50's to give novel characterizations of hidden Markov processes. Many of
those results are based on the idea that HMPs have finite dimension, which was
noticed earlier in that series of papers without explicitly stating it. We give
a version of Heller's theorem that is adapted to the language in use here.
Heller's version is formulated in the language of homological algebra, without
string functions and Hankel matrices. In his paper, discrete random processes
are viewed as modules over certain rings. This language has never been used
later in the theory of stochastic processes or related areas, probably because
the required amount of prior knowledge unfamiliar to statisticians and
probabilists is high. In the following we define

$$C_p := \operatorname{span}\{p^w \mid w \in \Sigma^*\} \tag{75}$$

to be the column space of the Hankel matrix $\mathcal{P}_p$ of a string
function $p$.

Theorem 5.1 (Heller, 1965). A string function $p : \Sigma^* \to \mathbb{R}$ is
associated with a (unconstrained) hidden Markov process if and only if there
are (U)SSFs $p_i \in C_p$, $i = 1, ..., l$ s.t.

(a) $p \in \operatorname{cone}\{p_i \mid i = 1, ..., l\}$,

(b) $\forall w \in \Sigma^* : (p_i)^w \in \operatorname{cone}\{p_i \mid i = 1, ..., l\}$.

Note first that this again points out that hidden Markov processes $p$ are
finite-dimensional as $C_p \subset \operatorname{span}\{p_i \mid i = 1, ..., l\}$,
hence $\dim p \le l$. Note further that (a) in combination with (b) implies
that $p^v \in \operatorname{cone}\{p_i\}$ for all $v \in \Sigma^*$ which
renders $\operatorname{cone}\{p_i\}$ to be full-dimensional. It is closed due
to being polyhedral and pointed due to being generated by SSFs which are
strictly positive string functions. Collecting properties results in
$\operatorname{cone}\{p_i\}$ being a proper, polyhedral cone.

Given a hidden Markov process p, the pi can be obtained as the random 
processes starting from the hidden states (i.e. having initial probability distri- 
bution 6j). The other direction requires more work. A translation of Heller's 
proof [T2] to the language of string functions can be found in [TTJ . A rather 
straightforward consequence of Heller's theorem is the following corollary. 

Corollary 5.2. Let $p$ be a USSF of dimension at most 2. Then $p$ is
associated with an unconstrained hidden Markov process acting on 2 hidden
states.

Proof sketch. As all $p^w \ge 0$, the cone generated by all column vectors

$$\operatorname{cone}\{p^w \mid w \in \Sigma^*\} \tag{76}$$

is pointed, hence its closure is generated by its extremal rays. In two
dimensions this is equivalent to the closure of
$\operatorname{cone}\{p^w \mid w \in \Sigma^*\}$ being polyhedral. It is a
routine exercise to check that the assumptions of Heller's theorem hold for
this cone. $\diamond$

One might be tempted to infer that the ideal of model invariants of $f_{n,2}$
can be computed by computing the invariants of the 2-dimensional model, as
provided by theorem 3.1. However, a 2-dimensional process need not be
associated with a hidden Markov process acting on 2 hidden states. According
to the proof of theorem 5.1, one might need up to $2|\Sigma|$ many hidden
states to describe an arbitrary 2-dimensional process by means of a hidden
Markov parameterization.



5.1 Degree of Invariants 

Heller's theorem gives rise to an application of theorem 4.2 to hidden Markov
processes where $n \ge 2l$. Assumption (i) of theorem 4.2 is met since hidden
Markov processes on $l$ hidden states, as finite-dimensional random processes
of dimension $\le l$, are determined by their distributions over the strings
of length $2l - 1$. Assumption (ii) is met due to Heller's theorem (proofs of
this can also be formulated in terms of the hidden Markov processes'
parameterizations; however, such proofs are lengthy and technical exercises).
The only thing one has to be aware of is that the dimension of the column
space of $V_{p^a}$ can be lower than that of $V_p$ itself. In this case, one
obtains the necessary cone generators by projecting $C_p$ onto $C_{p^a}$ (we
recall that $C_{p^a} \subset C_p$). In sum, the class of unconstrained hidden
Markov processes meets the assumptions of theorem 4.2, which yields

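Corollary 5.3. Let $n \ge 2l$. Then it holds that

$$(p(u), u \in \Sigma^{n+1}) \in \operatorname{image} f_{n+1,l}
\iff
\begin{cases}
(p(av), v \in \Sigma^n) \in \operatorname{image} f_{n,l} \quad \forall a \in \Sigma, \\
(p(v), v \in \Sigma^n) \in \operatorname{image} f_{n,l}.
\end{cases} \tag{77}$$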
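
The two kinds of vectors that appear in (77) can be read off a distribution over strings of length $n + 1$ directly. The following minimal Python sketch does only this bookkeeping; it does not test membership in the image of any model. The function names and the dictionary representation are hypothetical, and the marginal $p(v)$ is taken to be the sum of $p(va)$ over all letters $a$, in line with the usual convention for prefixes.

```python
from fractions import Fraction
from itertools import product


def strings(sigma, n):
    """All strings of length n over the alphabet sigma."""
    return ["".join(t) for t in product(sigma, repeat=n)]


def conditional_slices(p_next, sigma, n):
    """For each letter a, return the vector (p(av), v in Sigma^n) as a dict."""
    return {a: {v: p_next[a + v] for v in strings(sigma, n)} for a in sigma}


def prefix_marginal(p_next, sigma, n):
    """Return (p(v), v in Sigma^n) with p(v) := sum over letters a of p(va)."""
    return {v: sum(p_next[v + a] for a in sigma) for v in strings(sigma, n)}


# Usage with a hypothetical uniform distribution over strings of length 3.
uniform3 = {u: Fraction(1, 8) for u in strings("ab", 3)}
slices = conditional_slices(uniform3, "ab", 2)
marginal = prefix_marginal(uniform3, "ab", 2)
```

Note that the slices $(p(av), v \in \Sigma^n)$ are in general unconstrained, that is, their entries need not sum to one, which is why the unconstrained model is the natural target here.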



Note that an ideal-theoretic equivalent of corollary 5.3 would yield a proof
of conjecture 11.9 from [3] as a corollary. However, an ideal-theoretic
equivalent of corollary 5.3 would be a stronger result:

Conjecture 5.4. Let $f_{n,l}$ be the unconstrained hidden Markov model for $l$
hidden states and strings of length $n$. Then the maximum degree $d(n, l)$ of
the invariants of $f_{n,l}$ does not increase for $n \ge 2l$, that is,

$$\dots \le d(n+1, l) \le d(n, l) \le d(n-1, l) \le \dots \le d(2l, l). \tag{78}$$

As $d(5, 2) = 1$ (see [3], table 11.1 (?)), we would obtain that
$d(n, 2) = 1$ for $n \ge 5$, that is, the ideal of invariants would be
generated by linear equations exclusively.



6 The Markov model 

In the following, let (U)SSFs $p$ be induced by Markov chains. That is,

$$p(v = a_1 \dots a_n) = \pi(a_1) \prod_{i=2}^{n} M_{a_{i-1} a_i} \tag{79}$$

where $\pi \in \mathbb{R}^{\Sigma}$ is a strictly positive vector (with entries
not necessarily summing up to one in case of a USSF $p$) and
$M \in \mathbb{R}^{\Sigma \times \Sigma}$ is a matrix with the entries of each
row summing up to one. Moreover, in this section, let

$$(\pi, M) \mapsto \Big( \pi(a_1) \prod_{i=2}^{n} M_{a_{i-1} a_i} \Big)_{a_1 \dots a_n \in \Sigma^n} \tag{80}$$

be the polynomial map (associated with the Markov model in case of $\pi, M$
being in accordance with the laws from above) with alphabet $\Sigma$ on strings
of length $n$. In the language of string functions and Hankel matrices, we
have the following theorem.

Theorem 6.1. A (U)SSF $p$ is associated with a Markov chain iff

$$\forall a \in \Sigma : \quad \dim \operatorname{span}\{p^{va} \mid v \in \Sigma^*\} \le 1. \tag{81}$$

A proof can be found in [17], for example.

This can be straightforwardly exploited to obtain invariants of the Markov
model.

Theorem 6.2. Let $(p(v), v \in \Sigma^n)$ be an (unconstrained) probability
distribution such that $n > 2|\Sigma| - 1$. Then $(p(v), v \in \Sigma^n)$ lies
in the image of the Markov model map (80) if and only if

$$\det \begin{pmatrix} p(vau) & p(wau) \\ p(vau') & p(wau') \end{pmatrix} = 0 \tag{82}$$

for all choices $u, u', v, w \in \Sigma^*$, $a \in \Sigma$ such that
$|vau|, |vau'|, |wau|, |wau'| \le n$ and, as usual,
$p(v) := \sum_{w \in \Sigma^{n - |v|}} p(vw)$ for strings $v$ such that
$|v| \le n$.

Proof. "$\Rightarrow$" is obvious as, for a Markov chain $p$, (82) is a
necessary consequence of (81) in theorem 6.1.

"$\Leftarrow$" Clearly, (82) implies that the assumptions (35) and (37) of
theorem 3.1 hold, which yields that $(p(v), v \in \Sigma^n)$ lies in the image
of the finite-dimensional model. We thus find, by means of algorithm 2.8,
$((T_a)_{a \in \Sigma}, x, y)$ such that the probabilities $p(v)$ for all $v$
up to length $n > 2|\Sigma|$ can be computed according to (6). Note that $T_a$
maps $p^v$ onto $p^{va}$, where $p^v, p^{va}$ are identified with a coordinate
representation induced by the basis of the column spaces that one has found
according to algorithm 2.8 (see the corresponding remark in section 3). In
this sense, (82) translates to

$$\dim \operatorname{image} T_a \le 1 \tag{83}$$

for all $a \in \Sigma$. Clearly, this implies (81) of theorem 6.1, from which
the assertion follows. $\diamond$
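
The necessity of the determinantal conditions (82) is easy to observe on a small example. The following Python sketch is a minimal illustration under stated assumptions: the chain parameters `PI` and `M`, the mixture used as a non-Markov counterexample, and all function names are hypothetical. It evaluates every 2x2 determinant of the form (82) whose strings have length at most $n$, using exact rational arithmetic.

```python
from fractions import Fraction as F
from itertools import product

SIGMA = "ab"
PI = {"a": F(1, 3), "b": F(2, 3)}  # hypothetical initial distribution
M = {"a": {"a": F(1, 4), "b": F(3, 4)},  # hypothetical transition matrix,
     "b": {"a": F(1, 2), "b": F(1, 2)}}  # each row sums to one


def markov(word):
    """Markov chain probability of a nonempty word, following (79)."""
    value = PI[word[0]]
    for prev, cur in zip(word, word[1:]):
        value *= M[prev][cur]
    return value


def mixture(word):
    """Equal mixture of two i.i.d. sources; a hypothetical non-Markov example."""
    k = word.count("a")
    m = len(word) - k
    return (F(1, 4) ** k * F(3, 4) ** m + F(3, 4) ** k * F(1, 4) ** m) / 2


def words(max_len):
    for k in range(max_len + 1):
        for t in product(SIGMA, repeat=k):
            yield "".join(t)


def determinants_vanish(p, n):
    """Check (82) for all v, w, u, u' such that all four strings have length <= n."""
    ws = list(words(n - 1))
    for a in SIGMA:
        for v, w, u, u2 in product(ws, repeat=4):
            if max(len(v), len(w)) + 1 + max(len(u), len(u2)) > n:
                continue
            det = p(v + a + u) * p(w + a + u2) - p(w + a + u) * p(v + a + u2)
            if det != 0:
                return False
    return True
```

Because the rows of `M` sum to one, evaluating `markov` on a shorter string agrees with marginalizing over all continuations, so no explicit summation is needed here. For the mixture, the choice $v = \varepsilon$, $w = a$, letter $a$, $u = \varepsilon$, $u' = a$ already gives a nonzero determinant.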



Remark 6.3. While the assumption $n > 2|\Sigma| - 1$ helps to give a rather
concise proof of theorem 6.2, we feel that it is not a necessary requirement.
However, inference of Markov chain parameters giving rise to probability
distributions $(p(v), v \in \Sigma^n)$ for which the determinantal invariants
(82) apply is a much more technical undertaking. Moreover, it seems that some
(potentially more involved) peculiarities have to be resolved.


7 Trace algebras



In this section, we will draw some connections between trace algebras and
the theory of finite-dimensional string functions. For a rigorous introduction
to trace algebras see [9]. We recall that in Bernd's preprint [2] the quartic
hidden Markov model invariant listed in [3] could be identified as a relation
between trace polynomials.

Here, we shall try to shed some light on the general relationships between
trace algebras and finite-dimensional models. In the language of trace
algebras, we will derive some defining relations for these algebras.

To this end, we introduce the following definition.

Definition 7.1. A string function $p : \Sigma^* \to \mathbb{R}$ is called
traceable of order $r$ if there are matrices
$X_a \in \mathbb{R}^{r \times r}$, $a \in \Sigma$, such that

$$p(v = a_1 \dots a_n) = \operatorname{tr} X_{a_n} \cdots X_{a_1}. \tag{84}$$

Traceable string functions are finite-dimensional, as can be seen by ap- 
plication of a simple lemma. 

Lemma 7.2. Let $p_i$, $i = 1, \dots, k$, be string functions of dimensions
$d_i$. Let $p := \sum_{i=1}^{k} p_i$. Then it holds that

$$\dim p \le \sum_{i=1}^{k} d_i. \tag{85}$$

This gives rise to

Theorem 7.3. Let $p \in \mathbb{R}^{\Sigma^*}$ be traceable of order $r$. Then

$$\dim p \le r^2. \tag{86}$$

Proof. Let $p_i \in \mathbb{R}^{\Sigma^*}$, $i = 1, \dots, r$, be defined by

$$p_i(v = a_1 \dots a_n) := \operatorname{tr} X_{a_n} \cdots X_{a_1} e_i e_i^T. \tag{87}$$

From theorem 2.5 we obtain $\dim p_i \le r$. As
$\mathrm{Id} = \sum_{i=1}^{r} e_i e_i^T$, which yields

$$p = \sum_{i=1}^{r} p_i, \tag{88}$$

the assertion follows from application of lemma 7.2. $\diamond$



If the identity matrix $\mathrm{Id} = \sum_{i=1}^{r} e_i e_i^T$ was presentable
in the form $\mathrm{Id} = x y^T$ itself, traceable string functions of order
$r$ would be of dimension at most $r$, as given by theorem 2.5. As this is not
the case, there are traceable string functions of order $r$ whose dimension is
larger than $r$. Moreover, not every string function of dimension $r^2$ seems
to be traceable. However, an example of that kind is yet to be delivered.
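
That a traceable string function of order $r$ can indeed have dimension larger than $r$ can be seen on a small example. The following Python sketch is a minimal illustration under stated assumptions: the two matrices in `X` are a hypothetical choice, the index strings are picked by hand, and the dimension of $p$ is taken to be the rank of its Hankel matrix, so that the rank of any finite block is a lower bound. The function names are likewise hypothetical.

```python
from fractions import Fraction as F

# Hypothetical matrices defining a traceable string function of order r = 2.
X = {"a": [[1, 1], [0, 1]], "b": [[1, 0], [1, 1]]}


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def p(word):
    """Traceable string function (84): tr X_{a_n} ... X_{a_1}."""
    prod = [[1, 0], [0, 1]]
    for a in word:
        prod = matmul(X[a], prod)  # later letters multiply from the left
    return prod[0][0] + prod[1][1]


def rank(rows):
    """Rank over the rationals by Gaussian elimination."""
    rows = [[F(x) for x in r] for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk


# Finite block of the Hankel matrix, indexed by a few short strings.
INDEX = ["", "a", "b", "ab"]
hankel_block = [[p(v + w) for w in INDEX] for v in INDEX]
```

For this choice the 4x4 block has full rank, so the dimension is at least $4 = r^2$, and by theorem 7.3 it is exactly 4. The reason is that the identity and the products of the two matrices already span all 2x2 matrices, and the trace form is nondegenerate on that space.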

The consequences of theorem 7.3 for the theory of trace algebras are
that invariants which can be computed for the $r^2$-dimensional models
$f_{n,r^2}$ also apply as defining relations for the trace algebras generated
by all trace polynomials

$$\operatorname{tr}(X_{i_n} \cdots X_{i_1}), \quad 1 \le i_j \le d, \quad n \ge 0. \tag{89}$$

The exact relationships between trace algebras, hidden Markov as well as
the finite-dimensional models are yet to be determined.

8 Open Questions 

1. Theorem 5.1 characterizes hidden Markov chains within the theory of
finite-dimensional random processes. Determine invariants that correspond
to this characterization.

2. Determine the relationships between trace algebras and the models
under consideration here in more detail.

3. Deliver a proof for a more general version of theorem 6.2, as discussed
above.

4. Determine the precise differences between the two-dimensional
models and the hidden Markov models for 2 hidden states.

5. Tropicalization of Teichmüller spaces (see [2])?